Overview

Dataset statistics

Number of variables: 27
Number of observations: 99343
Missing cells: 178495
Missing cells (%): 6.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.3 MiB
Average record size in memory: 130.0 B
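
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell share is taken over all cells (variables × observations) and that 1 MiB is 1,048,576 bytes; the variable names are illustrative and not part of the report:

```python
# Counts copied from the dataset statistics above.
n_variables = 27
n_observations = 99343
missing_cells = 178495
total_size_mib = 12.3  # already rounded in the report

# Share of missing cells over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Approximate only, because the 12.3 MiB input is itself rounded.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```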

Variable types

Categorical: 19
Numeric: 8

Alerts

diag_1 has a high cardinality: 715 distinct values [High cardinality]
diag_2 has a high cardinality: 747 distinct values [High cardinality]
diag_3 has a high cardinality: 786 distinct values [High cardinality]
admission_type_id is highly overall correlated with Admission_Source [High correlation]
max_glu_serum is highly overall correlated with Thiazolidinediones and 2 other fields [High correlation]
A1Cresult is highly overall correlated with Sulfonylureas [High correlation]
change is highly overall correlated with diabetesMed and 1 other fields [High correlation]
diabetesMed is highly overall correlated with change and 1 other fields [High correlation]
Thiazolidinediones is highly overall correlated with max_glu_serum and 1 other fields [High correlation]
Sulfonylureas is highly overall correlated with max_glu_serum and 1 other fields [High correlation]
Metformin is highly overall correlated with max_glu_serum and 1 other fields [High correlation]
Insulin is highly overall correlated with change and 1 other fields [High correlation]
Admission_Source is highly overall correlated with admission_type_id [High correlation]
discharge_disposition_id is highly imbalanced (55.0%) [Imbalance]
Meglitinides is highly imbalanced (89.8%) [Imbalance]
Thiazolidinediones is highly imbalanced (> 99.9%) [Imbalance]
Sulfonylureas is highly imbalanced (> 99.9%) [Imbalance]
AG_Inhibitors is highly imbalanced (94.8%) [Imbalance]
Metformin is highly imbalanced (> 99.9%) [Imbalance]
diag_3 has 1419 (1.4%) missing values [Missing]
max_glu_serum has 94191 (94.8%) missing values [Missing]
A1Cresult has 82509 (83.1%) missing values [Missing]
number_emergency is highly skewed (γ1 = 22.84877954) [Skewed]
num_procedures has 45679 (46.0%) zeros [Zeros]
number_outpatient has 82994 (83.5%) zeros [Zeros]
number_emergency has 88249 (88.8%) zeros [Zeros]
number_inpatient has 66245 (66.7%) zeros [Zeros]
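
The percentages in the Missing and Zeros alerts can be checked against the observation count. A small sketch, assuming each share is simply the count divided by the 99343 observations and rounded to one decimal; the helper name `share_pct` is hypothetical:

```python
n_observations = 99343

# (variable, alert kind, count) copied from the alerts above.
alerts = [
    ("diag_3", "missing", 1419),
    ("max_glu_serum", "missing", 94191),
    ("A1Cresult", "missing", 82509),
    ("num_procedures", "zeros", 45679),
    ("number_outpatient", "zeros", 82994),
    ("number_emergency", "zeros", 88249),
    ("number_inpatient", "zeros", 66245),
]

def share_pct(count, total=n_observations):
    """Percentage of rows affected, rounded to one decimal place."""
    return round(100 * count / total, 1)

for name, kind, count in alerts:
    print(f"{name}: {count} {kind} = {share_pct(count)}%")
```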

Reproduction

Analysis started: 2023-11-05 16:13:26.461469
Analysis finished: 2023-11-05 16:13:45.986448
Duration: 19.52 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

gender
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB
Female 53454
Male 45886
Missing 3

Length

Max length: 7
Median length: 6
Mean length: 5.0762409
Min length: 4

Characters and Unicode

Total characters: 504289
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
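
The character totals for gender can be reproduced from the three value counts alone. A minimal sketch, assuming the 3 rows labelled Missing hold the literal string "Missing" (the variable reports 0 missing cells); all names are illustrative:

```python
from collections import Counter

# Value counts for gender, copied from the report.
value_counts = {"Female": 53454, "Male": 45886, "Missing": 3}

# Each character of a value occurs once per row holding that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

n_rows = sum(value_counts.values())
total_chars = sum(char_counts.values())
mean_length = total_chars / n_rows

print(total_chars, len(char_counts), round(mean_length, 7))
print(char_counts.most_common(3))
```

Counting this way also accounts for why uppercase M appears 45889 times rather than 45886: the three "Missing" rows contribute one M each.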

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Female
4th row: Male
5th row: Male

Common Values

Value | Count | Frequency (%)
Female | 53454 | 53.8%
Male | 45886 | 46.2%
Missing | 3 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
female 53454
53.8%
male 45886
46.2%
missing 3
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 152794
30.3%
a 99340
19.7%
l 99340
19.7%
F 53454
 
10.6%
m 53454
 
10.6%
M 45889
 
9.1%
i 6
 
< 0.1%
s 6
 
< 0.1%
n 3
 
< 0.1%
g 3
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 404946
80.3%
Uppercase Letter 99343
 
19.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 152794
37.7%
a 99340
24.5%
l 99340
24.5%
m 53454
 
13.2%
i 6
 
< 0.1%
s 6
 
< 0.1%
n 3
 
< 0.1%
g 3
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
F 53454
53.8%
M 45889
46.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 504289
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 152794
30.3%
a 99340
19.7%
l 99340
19.7%
F 53454
 
10.6%
m 53454
 
10.6%
M 45889
 
9.1%
i 6
 
< 0.1%
s 6
 
< 0.1%
n 3
 
< 0.1%
g 3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 504289
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 152794
30.3%
a 99340
19.7%
l 99340
19.7%
F 53454
 
10.6%
m 53454
 
10.6%
M 45889
 
9.1%
i 6
 
< 0.1%
s 6
 
< 0.1%
n 3
 
< 0.1%
g 3
 
< 0.1%

age
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB
[70-80) 25331
[60-70) 22059
[50-60) 17060
[80-90) 16434
[40-50) 9607
Other values (5) 8852

Length

Max length: 8
Median length: 7
Mean length: 7.0244506
Min length: 6

Characters and Unicode

Total characters: 697830
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: [0-10)
2nd row: [10-20)
3rd row: [20-30)
4th row: [30-40)
5th row: [40-50)

Common Values

Value | Count | Frequency (%)
[70-80) | 25331 | 25.5%
[60-70) | 22059 | 22.2%
[50-60) | 17060 | 17.2%
[80-90) | 16434 | 16.5%
[40-50) | 9607 | 9.7%
[30-40) | 3764 | 3.8%
[90-100) | 2589 | 2.6%
[20-30) | 1649 | 1.7%
[10-20) | 690 | 0.7%
[0-10) | 160 | 0.2%
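
Because age is stored as half-open interval labels, it is profiled as a categorical rather than a number. A rough sketch of how the labels could be parsed into numeric bounds, with an approximate mean age taken from bin midpoints; the midpoint assumption and the helper `parse_interval` are illustrative and not something the report computes:

```python
import re

# Counts per age bracket, copied from the Common Values table.
age_counts = {
    "[70-80)": 25331, "[60-70)": 22059, "[50-60)": 17060,
    "[80-90)": 16434, "[40-50)": 9607, "[30-40)": 3764,
    "[90-100)": 2589, "[20-30)": 1649, "[10-20)": 690, "[0-10)": 160,
}

def parse_interval(label):
    """Turn a label such as '[70-80)' into the pair (70, 80)."""
    match = re.fullmatch(r"\[(\d+)-(\d+)\)", label)
    if match is None:
        raise ValueError(f"unexpected age label: {label!r}")
    return int(match.group(1)), int(match.group(2))

n_rows = sum(age_counts.values())

# Crude estimate: treat every patient as sitting at the bin midpoint.
approx_mean_age = sum(
    sum(parse_interval(label)) / 2 * count
    for label, count in age_counts.items()
) / n_rows

print(n_rows, round(approx_mean_age, 1))
```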

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
70-80 25331
25.5%
60-70 22059
22.2%
50-60 17060
17.2%
80-90 16434
16.5%
40-50 9607
 
9.7%
30-40 3764
 
3.8%
90-100 2589
 
2.6%
20-30 1649
 
1.7%
10-20 690
 
0.7%
0-10 160
 
0.2%

Most occurring characters

ValueCountFrequency (%)
0 201275
28.8%
[ 99343
14.2%
- 99343
14.2%
) 99343
14.2%
7 47390
 
6.8%
8 41765
 
6.0%
6 39119
 
5.6%
5 26667
 
3.8%
9 19023
 
2.7%
4 13371
 
1.9%
Other values (3) 11191
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 399801
57.3%
Open Punctuation 99343
 
14.2%
Dash Punctuation 99343
 
14.2%
Close Punctuation 99343
 
14.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 201275
50.3%
7 47390
 
11.9%
8 41765
 
10.4%
6 39119
 
9.8%
5 26667
 
6.7%
9 19023
 
4.8%
4 13371
 
3.3%
3 5413
 
1.4%
1 3439
 
0.9%
2 2339
 
0.6%
Open Punctuation
ValueCountFrequency (%)
[ 99343
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 99343
100.0%
Close Punctuation
ValueCountFrequency (%)
) 99343
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 697830
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 201275
28.8%
[ 99343
14.2%
- 99343
14.2%
) 99343
14.2%
7 47390
 
6.8%
8 41765
 
6.0%
6 39119
 
5.6%
5 26667
 
3.8%
9 19023
 
2.7%
4 13371
 
1.9%
Other values (3) 11191
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 697830
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 201275
28.8%
[ 99343
14.2%
- 99343
14.2%
) 99343
14.2%
7 47390
 
6.8%
8 41765
 
6.0%
6 39119
 
5.6%
5 26667
 
3.8%
9 19023
 
2.7%
4 13371
 
1.9%
Other values (3) 11191
 
1.6%

admission_type_id
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB
Emergency 52371
Elective 18668
Urgent 18132
Missing 10154
Trauma Centre 18

Length

Max length: 13
Median length: 9
Mean length: 8.0608297
Min length: 6

Characters and Unicode

Total characters: 800787
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Missing
2nd row: Emergency
3rd row: Emergency
4th row: Emergency
5th row: Emergency

Common Values

Value | Count | Frequency (%)
Emergency | 52371 | 52.7%
Elective | 18668 | 18.8%
Urgent | 18132 | 18.3%
Missing | 10154 | 10.2%
Trauma Centre | 18 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
emergency 52371
52.7%
elective 18668
 
18.8%
urgent 18132
 
18.2%
missing 10154
 
10.2%
trauma 18
 
< 0.1%
centre 18
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 160246
20.0%
n 80675
10.1%
g 80657
10.1%
E 71039
8.9%
c 71039
8.9%
r 70539
8.8%
m 52389
 
6.5%
y 52371
 
6.5%
i 38976
 
4.9%
t 36818
 
4.6%
Other values (10) 86038
10.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 701408
87.6%
Uppercase Letter 99361
 
12.4%
Space Separator 18
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 160246
22.8%
n 80675
11.5%
g 80657
11.5%
c 71039
10.1%
r 70539
10.1%
m 52389
 
7.5%
y 52371
 
7.5%
i 38976
 
5.6%
t 36818
 
5.2%
s 20308
 
2.9%
Other values (4) 37390
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
E 71039
71.5%
U 18132
 
18.2%
M 10154
 
10.2%
T 18
 
< 0.1%
C 18
 
< 0.1%
Space Separator
ValueCountFrequency (%)
18
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 800769
> 99.9%
Common 18
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 160246
20.0%
n 80675
10.1%
g 80657
10.1%
E 71039
8.9%
c 71039
8.9%
r 70539
8.8%
m 52389
 
6.5%
y 52371
 
6.5%
i 38976
 
4.9%
t 36818
 
4.6%
Other values (9) 86020
10.7%
Common
ValueCountFrequency (%)
18
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 800787
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 160246
20.0%
n 80675
10.1%
g 80657
10.1%
E 71039
8.9%
c 71039
8.9%
r 70539
8.8%
m 52389
 
6.5%
y 52371
 
6.5%
i 38976
 
4.9%
t 36818
 
4.6%
Other values (10) 86038
10.7%

discharge_disposition_id
Categorical

IMBALANCE 

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB
1 60234
3 13954
6 12902
999 4686
2 2128
Other values (14) 5439

Length

Max length: 3
Median length: 1
Mean length: 1.1213976
Min length: 1

Characters and Unicode

Total characters: 111403
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 999
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 60234 | 60.6%
3 | 13954 | 14.0%
6 | 12902 | 13.0%
999 | 4686 | 4.7%
2 | 2128 | 2.1%
22 | 1993 | 2.0%
5 | 1184 | 1.2%
4 | 815 | 0.8%
7 | 623 | 0.6%
23 | 412 | 0.4%
Other values (9) | 412 | 0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 60234
60.6%
3 13954
 
14.0%
6 12902
 
13.0%
999 4686
 
4.7%
2 2128
 
2.1%
22 1993
 
2.0%
5 1184
 
1.2%
4 815
 
0.8%
7 623
 
0.6%
23 412
 
0.4%
Other values (9) 412
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1 60325
54.2%
3 14366
 
12.9%
9 14079
 
12.6%
6 12913
 
11.6%
2 6721
 
6.0%
5 1247
 
1.1%
4 863
 
0.8%
7 642
 
0.6%
8 247
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 111403
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 60325
54.2%
3 14366
 
12.9%
9 14079
 
12.6%
6 12913
 
11.6%
2 6721
 
6.0%
5 1247
 
1.1%
4 863
 
0.8%
7 642
 
0.6%
8 247
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common 111403
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 60325
54.2%
3 14366
 
12.9%
9 14079
 
12.6%
6 12913
 
11.6%
2 6721
 
6.0%
5 1247
 
1.1%
4 863
 
0.8%
7 642
 
0.6%
8 247
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 111403
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 60325
54.2%
3 14366
 
12.9%
9 14079
 
12.6%
6 12913
 
11.6%
2 6721
 
6.0%
5 1247
 
1.1%
4 863
 
0.8%
7 642
 
0.6%
8 247
 
0.2%

time_in_hospital
Real number (ℝ)

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.3793322
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 11
Maximum: 14
Range: 13
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.968409
Coefficient of variation (CV): 0.67782229
Kurtosis: 0.88755278
Mean: 4.3793322
Median Absolute Deviation (MAD): 2
Skewness: 1.1418204
Sum: 435056
Variance: 8.8114518
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
3 17432
17.5%
2 16891
17.0%
1 13824
13.9%
4 13684
13.8%
5 9749
9.8%
6 7355
7.4%
7 5696
 
5.7%
8 4271
 
4.3%
9 2879
 
2.9%
10 2262
 
2.3%
Other values (4) 5300
 
5.3%
ValueCountFrequency (%)
1 13824
13.9%
2 16891
17.0%
3 17432
17.5%
4 13684
13.8%
5 9749
9.8%
6 7355
7.4%
7 5696
 
5.7%
8 4271
 
4.3%
9 2879
 
2.9%
10 2262
 
2.3%
ValueCountFrequency (%)
14 995
 
1.0%
13 1152
 
1.2%
12 1383
 
1.4%
11 1770
 
1.8%
10 2262
 
2.3%
9 2879
 
2.9%
8 4271
4.3%
7 5696
5.7%
6 7355
7.4%
5 9749
9.8%
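
The quantile statistics for time_in_hospital can be recovered from the value counts without the raw rows. A sketch under the assumption that quantiles use linear interpolation between neighbouring order statistics (the pandas default); the helper names are hypothetical:

```python
from bisect import bisect_right
from itertools import accumulate

# Days in hospital -> number of rows, copied from the tables above.
freq = {
    1: 13824, 2: 16891, 3: 17432, 4: 13684, 5: 9749, 6: 7355, 7: 5696,
    8: 4271, 9: 2879, 10: 2262, 11: 1770, 12: 1383, 13: 1152, 14: 995,
}

values = sorted(freq)
cumulative = list(accumulate(freq[v] for v in values))
n = cumulative[-1]
total = sum(v * c for v, c in freq.items())

def value_at(index):
    """Value at a 0-based position in the sorted data."""
    return values[bisect_right(cumulative, index)]

def quantile(q):
    """Linearly interpolated quantile over the implied sorted data."""
    pos = q * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return value_at(lo) + (pos - lo) * (value_at(hi) - value_at(lo))

for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, quantile(q))
print("sum:", total, "mean:", total / n)
```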

num_lab_procedures
Real number (ℝ)

Distinct: 118
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.906929
Minimum: 1
Maximum: 132
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 31
median: 44
Q3: 57
95-th percentile: 73
Maximum: 132
Range: 131
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 19.610032
Coefficient of variation (CV): 0.45703649
Kurtosis: -0.25299214
Mean: 42.906929
Median Absolute Deviation (MAD): 13
Skewness: -0.24153505
Sum: 4262503
Variance: 384.55336
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 3153
 
3.2%
43 2752
 
2.8%
44 2447
 
2.5%
45 2331
 
2.3%
38 2180
 
2.2%
40 2163
 
2.2%
46 2142
 
2.2%
41 2082
 
2.1%
42 2066
 
2.1%
39 2061
 
2.1%
Other values (108) 75966
76.5%
ValueCountFrequency (%)
1 3153
3.2%
2 1089
 
1.1%
3 662
 
0.7%
4 377
 
0.4%
5 285
 
0.3%
6 278
 
0.3%
7 321
 
0.3%
8 361
 
0.4%
9 923
 
0.9%
10 828
 
0.8%
ValueCountFrequency (%)
132 1
 
< 0.1%
129 1
 
< 0.1%
126 1
 
< 0.1%
121 1
 
< 0.1%
120 1
 
< 0.1%
118 1
 
< 0.1%
114 1
 
< 0.1%
113 3
< 0.1%
111 2
< 0.1%
109 3
< 0.1%

num_procedures
Real number (ℝ)

ZEROS 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3342359
Minimum: 0
Maximum: 6
Zeros: 45679
Zeros (%): 46.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.7027856
Coefficient of variation (CV): 1.2762253
Kurtosis: 0.87667601
Mean: 1.3342359
Median Absolute Deviation (MAD): 1
Skewness: 1.3221628
Sum: 132547
Variance: 2.8994789
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 45679
46.0%
1 20250
20.4%
2 12373
 
12.5%
3 9203
 
9.3%
6 4801
 
4.8%
4 4049
 
4.1%
5 2988
 
3.0%
ValueCountFrequency (%)
0 45679
46.0%
1 20250
20.4%
2 12373
 
12.5%
3 9203
 
9.3%
4 4049
 
4.1%
5 2988
 
3.0%
6 4801
 
4.8%
ValueCountFrequency (%)
6 4801
 
4.8%
5 2988
 
3.0%
4 4049
 
4.1%
3 9203
 
9.3%
2 12373
 
12.5%
1 20250
20.4%
0 45679
46.0%
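
Since num_procedures takes only seven values, its descriptive statistics can be recomputed from the value counts. A minimal sketch, assuming the report uses the sample variance (n - 1 denominator) and the bias-adjusted skewness that pandas computes by default; the variable names are illustrative:

```python
import math

# Number of procedures -> number of rows, copied from the tables above.
freq = {0: 45679, 1: 20250, 2: 12373, 3: 9203, 4: 4049, 5: 2988, 6: 4801}

n = sum(freq.values())
total = sum(v * c for v, c in freq.items())
mean = total / n

# Central moments with the population (divide-by-n) convention.
m2 = sum(c * (v - mean) ** 2 for v, c in freq.items()) / n
m3 = sum(c * (v - mean) ** 3 for v, c in freq.items()) / n

variance = m2 * n / (n - 1)          # sample variance
std = math.sqrt(variance)
cv = std / mean                      # coefficient of variation
# Adjusted Fisher-Pearson skewness (sample-size corrected).
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

print(f"mean={mean:.7f} variance={variance:.7f} std={std:.7f}")
print(f"cv={cv:.7f} skewness={skewness:.5f}")
```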

num_medications
Real number (ℝ)

Distinct: 75
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.979062
Minimum: 1
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 10
median: 15
Q3: 20
95-th percentile: 31
Maximum: 81
Range: 80
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 8.0949093
Coefficient of variation (CV): 0.50659476
Kurtosis: 3.5330492
Mean: 15.979062
Median Absolute Deviation (MAD): 5
Skewness: 1.3353622
Sum: 1587408
Variance: 65.527556
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
13 5976
 
6.0%
12 5888
 
5.9%
11 5696
 
5.7%
15 5694
 
5.7%
14 5592
 
5.6%
16 5324
 
5.4%
10 5245
 
5.3%
9 4830
 
4.9%
17 4801
 
4.8%
18 4405
 
4.4%
Other values (65) 45892
46.2%
ValueCountFrequency (%)
1 260
 
0.3%
2 457
 
0.5%
3 874
 
0.9%
4 1383
 
1.4%
5 1964
 
2.0%
6 2634
2.7%
7 3421
3.4%
8 4244
4.3%
9 4830
4.9%
10 5245
5.3%
ValueCountFrequency (%)
81 1
 
< 0.1%
79 1
 
< 0.1%
75 2
 
< 0.1%
74 1
 
< 0.1%
72 3
< 0.1%
70 2
 
< 0.1%
69 5
< 0.1%
68 7
< 0.1%
67 7
< 0.1%
66 5
< 0.1%

number_outpatient
Real number (ℝ)

ZEROS 

Distinct: 39
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.36924595
Minimum: 0
Maximum: 42
Zeros: 82994
Zeros (%): 83.5%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 42
Range: 42
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.2651423
Coefficient of variation (CV): 3.4262862
Kurtosis: 149.04706
Mean: 0.36924595
Median Absolute Deviation (MAD): 0
Skewness: 8.8396687
Sum: 36682
Variance: 1.6005851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)
ValueCountFrequency (%)
0 82994
83.5%
1 8349
 
8.4%
2 3509
 
3.5%
3 1997
 
2.0%
4 1077
 
1.1%
5 514
 
0.5%
6 294
 
0.3%
7 154
 
0.2%
8 96
 
0.1%
9 82
 
0.1%
Other values (29) 277
 
0.3%
ValueCountFrequency (%)
0 82994
83.5%
1 8349
 
8.4%
2 3509
 
3.5%
3 1997
 
2.0%
4 1077
 
1.1%
5 514
 
0.5%
6 294
 
0.3%
7 154
 
0.2%
8 96
 
0.1%
9 82
 
0.1%
ValueCountFrequency (%)
42 1
< 0.1%
40 1
< 0.1%
39 1
< 0.1%
38 1
< 0.1%
37 1
< 0.1%
36 2
< 0.1%
35 2
< 0.1%
34 1
< 0.1%
33 2
< 0.1%
29 2
< 0.1%

number_emergency
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.19844378
Minimum: 0
Maximum: 76
Zeros: 88249
Zeros (%): 88.8%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 76
Range: 76
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.93773391
Coefficient of variation (CV): 4.7254388
Kurtosis: 1183.2035
Mean: 0.19844378
Median Absolute Deviation (MAD): 0
Skewness: 22.84878
Sum: 19714
Variance: 0.87934488
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)
ValueCountFrequency (%)
0 88249
88.8%
1 7474
 
7.5%
2 1984
 
2.0%
3 706
 
0.7%
4 369
 
0.4%
5 190
 
0.2%
6 93
 
0.1%
7 72
 
0.1%
8 50
 
0.1%
10 34
 
< 0.1%
Other values (23) 122
 
0.1%
ValueCountFrequency (%)
0 88249
88.8%
1 7474
 
7.5%
2 1984
 
2.0%
3 706
 
0.7%
4 369
 
0.4%
5 190
 
0.2%
6 93
 
0.1%
7 72
 
0.1%
8 50
 
0.1%
9 33
 
< 0.1%
ValueCountFrequency (%)
76 1
< 0.1%
64 1
< 0.1%
63 1
< 0.1%
54 1
< 0.1%
46 1
< 0.1%
42 1
< 0.1%
37 1
< 0.1%
29 1
< 0.1%
28 1
< 0.1%
25 2
< 0.1%

number_inpatient
Real number (ℝ)

ZEROS 

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.63093524
Minimum: 0
Maximum: 21
Zeros: 66245
Zeros (%): 66.7%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 3
Maximum: 21
Range: 21
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.2604283
Coefficient of variation (CV): 1.9977143
Kurtosis: 20.913218
Mean: 0.63093524
Median Absolute Deviation (MAD): 0
Skewness: 3.6334201
Sum: 62679
Variance: 1.5886796
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
ValueCountFrequency (%)
0 66245
66.7%
1 18984
 
19.1%
2 7300
 
7.3%
3 3271
 
3.3%
4 1574
 
1.6%
5 791
 
0.8%
6 474
 
0.5%
7 262
 
0.3%
8 145
 
0.1%
9 109
 
0.1%
Other values (11) 188
 
0.2%
ValueCountFrequency (%)
0 66245
66.7%
1 18984
 
19.1%
2 7300
 
7.3%
3 3271
 
3.3%
4 1574
 
1.6%
5 791
 
0.8%
6 474
 
0.5%
7 262
 
0.3%
8 145
 
0.1%
9 109
 
0.1%
ValueCountFrequency (%)
21 1
 
< 0.1%
19 2
 
< 0.1%
18 1
 
< 0.1%
17 1
 
< 0.1%
16 6
 
< 0.1%
15 9
 
< 0.1%
14 10
 
< 0.1%
13 18
 
< 0.1%
12 32
< 0.1%
11 49
< 0.1%

diag_1
Categorical

HIGH CARDINALITY 

Distinct: 715
Distinct (%): 0.7%
Missing: 20
Missing (%): < 0.1%
Memory size: 3.0 MiB
428 6663
414 6550
786 4015
410 3448
486 3383
Other values (710) 75264

Length

Max length: 6
Median length: 3
Mean length: 3.1799482
Min length: 1

Characters and Unicode

Total characters: 315842
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 84
Unique (%): 0.1%

Sample

1st row: 250.83
2nd row: 276
3rd row: 648
4th row: 8
5th row: 197

Common Values

Value | Count | Frequency (%)
428 | 6663 | 6.7%
414 | 6550 | 6.6%
786 | 4015 | 4.0%
410 | 3448 | 3.5%
486 | 3383 | 3.4%
427 | 2720 | 2.7%
491 | 2240 | 2.3%
715 | 2147 | 2.2%
682 | 2029 | 2.0%
780 | 2004 | 2.0%
Other values (705) | 64124 | 64.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
428 6663
 
6.7%
414 6550
 
6.6%
786 4015
 
4.0%
410 3448
 
3.5%
486 3383
 
3.4%
427 2720
 
2.7%
491 2240
 
2.3%
715 2147
 
2.2%
682 2029
 
2.0%
780 2004
 
2.0%
Other values (705) 64124
64.6%
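
The "Other values" rows for diag_1 are residuals: the non-missing row count minus the listed top values. A short sketch of that arithmetic, which also suggests why the same count appears once as 64.5% and once as 64.6%. The assumption that the second table divides by non-missing rows only is an inference from the numbers and is not stated in the report:

```python
n_rows = 99343
n_missing = 20  # missing diag_1 cells reported for this variable

# Ten most common diag_1 codes in report order (dicts keep insertion order).
top10 = {
    "428": 6663, "414": 6550, "786": 4015, "410": 3448, "486": 3383,
    "427": 2720, "491": 2240, "715": 2147, "682": 2029, "780": 2004,
}

n_present = n_rows - n_missing

# Residual after the five values shown in the variable summary.
other_after_top5 = n_present - sum(list(top10.values())[:5])
# Residual after the ten values shown in the full tables.
other_after_top10 = n_present - sum(top10.values())

pct_of_all_rows = round(100 * other_after_top10 / n_rows, 1)
pct_of_present = round(100 * other_after_top10 / n_present, 1)

print(other_after_top5, other_after_top10)
print(pct_of_all_rows, pct_of_present)
```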

Most occurring characters

ValueCountFrequency (%)
4 54208
17.2%
2 39161
12.4%
8 36879
11.7%
5 36244
11.5%
7 28152
8.9%
1 27218
8.6%
0 24430
7.7%
6 22832
7.2%
9 19605
 
6.2%
3 17054
 
5.4%
Other values (3) 10059
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 305783
96.8%
Other Punctuation 8426
 
2.7%
Uppercase Letter 1633
 
0.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 54208
17.7%
2 39161
12.8%
8 36879
12.1%
5 36244
11.9%
7 28152
9.2%
1 27218
8.9%
0 24430
8.0%
6 22832
7.5%
9 19605
 
6.4%
3 17054
 
5.6%
Uppercase Letter
ValueCountFrequency (%)
V 1632
99.9%
E 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 8426
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 314209
99.5%
Latin 1633
 
0.5%

Most frequent character per script

Common
ValueCountFrequency (%)
4 54208
17.3%
2 39161
12.5%
8 36879
11.7%
5 36244
11.5%
7 28152
9.0%
1 27218
8.7%
0 24430
7.8%
6 22832
7.3%
9 19605
 
6.2%
3 17054
 
5.4%
Latin
ValueCountFrequency (%)
V 1632
99.9%
E 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 315842
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 54208
17.2%
2 39161
12.4%
8 36879
11.7%
5 36244
11.5%
7 28152
8.9%
1 27218
8.6%
0 24430
7.7%
6 22832
7.2%
9 19605
 
6.2%
3 17054
 
5.4%
Other values (3) 10059
 
3.2%

diag_2
Categorical

HIGH CARDINALITY 

Distinct: 747
Distinct (%): 0.8%
Missing: 356
Missing (%): 0.4%
Memory size: 3.0 MiB
276 6589
428 6459
250 6051
427 4892
401 3722
Other values (742) 71274

Length

Max length: 6
Median length: 3
Mean length: 3.1768313
Min length: 1

Characters and Unicode

Total characters: 314465
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 125
Unique (%): 0.1%

Sample

1st row: 250.01
2nd row: 250
3rd row: 250.43
4th row: 157
5th row: 411

Common Values

Value | Count | Frequency (%)
276 | 6589 | 6.6%
428 | 6459 | 6.5%
250 | 6051 | 6.1%
427 | 4892 | 4.9%
401 | 3722 | 3.7%
496 | 3246 | 3.3%
599 | 3212 | 3.2%
403 | 2743 | 2.8%
414 | 2642 | 2.7%
411 | 2551 | 2.6%
Other values (737) | 56880 | 57.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
276 6589
 
6.7%
428 6459
 
6.5%
250 6051
 
6.1%
427 4892
 
4.9%
401 3722
 
3.8%
496 3246
 
3.3%
599 3212
 
3.2%
403 2743
 
2.8%
414 2642
 
2.7%
411 2551
 
2.6%
Other values (737) 56880
57.5%

Most occurring characters

ValueCountFrequency (%)
4 49980
15.9%
2 48832
15.5%
5 37253
11.8%
0 33564
10.7%
7 27900
8.9%
8 27699
8.8%
1 25444
8.1%
9 21322
6.8%
6 19505
 
6.2%
3 13798
 
4.4%
Other values (3) 9168
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 305297
97.1%
Other Punctuation 6654
 
2.1%
Uppercase Letter 2514
 
0.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 49980
16.4%
2 48832
16.0%
5 37253
12.2%
0 33564
11.0%
7 27900
9.1%
8 27699
9.1%
1 25444
8.3%
9 21322
7.0%
6 19505
 
6.4%
3 13798
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
V 1787
71.1%
E 727
28.9%
Other Punctuation
ValueCountFrequency (%)
. 6654
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 311951
99.2%
Latin 2514
 
0.8%

Most frequent character per script

Common
ValueCountFrequency (%)
4 49980
16.0%
2 48832
15.7%
5 37253
11.9%
0 33564
10.8%
7 27900
8.9%
8 27699
8.9%
1 25444
8.2%
9 21322
6.8%
6 19505
 
6.3%
3 13798
 
4.4%
Latin
ValueCountFrequency (%)
V 1787
71.1%
E 727
28.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 314465
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 49980
15.9%
2 48832
15.5%
5 37253
11.8%
0 33564
10.7%
7 27900
8.9%
8 27699
8.8%
1 25444
8.1%
9 21322
6.8%
6 19505
 
6.2%
3 13798
 
4.4%
Other values (3) 9168
 
2.9%

diag_3
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 786
Distinct (%): 0.8%
Missing: 1419
Missing (%): 1.4%
Memory size: 3.0 MiB
250 11466
401 8241
276 4953
428 4412
427 3785
Other values (781) 65067

Length

Max length: 6
Median length: 3
Mean length: 3.1429884
Min length: 1

Characters and Unicode

Total characters: 307774
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 122
Unique (%): 0.1%

Sample

1st row: 255
2nd row: V27
3rd row: 403
4th row: 250
5th row: 250

Common Values

Value | Count | Frequency (%)
250 | 11466 | 11.5%
401 | 8241 | 8.3%
276 | 4953 | 5.0%
428 | 4412 | 4.4%
427 | 3785 | 3.8%
414 | 3635 | 3.7%
496 | 2504 | 2.5%
403 | 2277 | 2.3%
272 | 1966 | 2.0%
585 | 1930 | 1.9%
Other values (776) | 52755 | 53.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
250 11466
 
11.7%
401 8241
 
8.4%
276 4953
 
5.1%
428 4412
 
4.5%
427 3785
 
3.9%
414 3635
 
3.7%
496 2504
 
2.6%
403 2277
 
2.3%
272 1966
 
2.0%
585 1930
 
2.0%
Other values (776) 52755
53.9%

Most occurring characters

ValueCountFrequency (%)
2 50140
16.3%
4 48102
15.6%
5 40357
13.1%
0 39153
12.7%
7 25724
8.4%
1 24119
7.8%
8 23018
7.5%
9 16711
 
5.4%
6 15899
 
5.2%
3 14018
 
4.6%
Other values (3) 10533
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 297241
96.6%
Other Punctuation 5514
 
1.8%
Uppercase Letter 5019
 
1.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 50140
16.9%
4 48102
16.2%
5 40357
13.6%
0 39153
13.2%
7 25724
8.7%
1 24119
8.1%
8 23018
7.7%
9 16711
 
5.6%
6 15899
 
5.3%
3 14018
 
4.7%
Uppercase Letter
ValueCountFrequency (%)
V 3782
75.4%
E 1237
 
24.6%
Other Punctuation
ValueCountFrequency (%)
. 5514
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 302755
98.4%
Latin 5019
 
1.6%

Most frequent character per script

Common
ValueCountFrequency (%)
2 50140
16.6%
4 48102
15.9%
5 40357
13.3%
0 39153
12.9%
7 25724
8.5%
1 24119
8.0%
8 23018
7.6%
9 16711
 
5.5%
6 15899
 
5.3%
3 14018
 
4.6%
Latin
ValueCountFrequency (%)
V 3782
75.4%
E 1237
 
24.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 307774
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 50140
16.3%
4 48102
15.6%
5 40357
13.1%
0 39153
12.7%
7 25724
8.4%
1 24119
7.8%
8 23018
7.5%
9 16711
 
5.4%
6 15899
 
5.2%
3 14018
 
4.6%
Other values (3) 10533
 
3.4%

number_diagnoses
Real number (ℝ)

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.4017092
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 6
median: 8
Q3: 9
95-th percentile: 9
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.9410127
Coefficient of variation (CV): 0.26223844
Kurtosis: -0.12223322
Mean: 7.4017092
Median Absolute Deviation (MAD): 1
Skewness: -0.8614325
Sum: 735308
Variance: 3.7675302
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
ValueCountFrequency (%)
9 47810
48.1%
5 11301
 
11.4%
8 10359
 
10.4%
7 10215
 
10.3%
6 9986
 
10.1%
4 5499
 
5.5%
3 2824
 
2.8%
2 1021
 
1.0%
1 219
 
0.2%
16 42
 
< 0.1%
Other values (6) 67
 
0.1%
ValueCountFrequency (%)
1 219
 
0.2%
2 1021
 
1.0%
3 2824
 
2.8%
4 5499
 
5.5%
5 11301
 
11.4%
6 9986
 
10.1%
7 10215
 
10.3%
8 10359
 
10.4%
9 47810
48.1%
10 16
 
< 0.1%
ValueCountFrequency (%)
16 42
 
< 0.1%
15 10
 
< 0.1%
14 6
 
< 0.1%
13 16
 
< 0.1%
12 8
 
< 0.1%
11 11
 
< 0.1%
10 16
 
< 0.1%
9 47810
48.1%
8 10359
 
10.4%
7 10215
 
10.3%

max_glu_serum
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): 0.1%
Missing: 94191
Missing (%): 94.8%
Memory size: 3.5 MiB
Norm 2545
>200 1419
>300 1188

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 20608
Distinct characters: 8
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: >300
2nd row: >300
3rd row: Norm
4th row: Norm
5th row: Norm

Common Values

Value | Count | Frequency (%)
Norm | 2545 | 2.6%
>200 | 1419 | 1.4%
>300 | 1188 | 1.2%
(Missing) | 94191 | 94.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
norm 2545
49.4%
200 1419
27.5%
300 1188
23.1%

Most occurring characters

ValueCountFrequency (%)
0 5214
25.3%
> 2607
12.7%
N 2545
12.3%
o 2545
12.3%
r 2545
12.3%
m 2545
12.3%
2 1419
 
6.9%
3 1188
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7821
38.0%
Lowercase Letter 7635
37.0%
Math Symbol 2607
 
12.7%
Uppercase Letter 2545
 
12.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 5214
66.7%
2 1419
 
18.1%
3 1188
 
15.2%
Lowercase Letter
ValueCountFrequency (%)
o 2545
33.3%
r 2545
33.3%
m 2545
33.3%
Math Symbol
ValueCountFrequency (%)
> 2607
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 2545
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10428
50.6%
Latin 10180
49.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0 5214
50.0%
> 2607
25.0%
2 1419
 
13.6%
3 1188
 
11.4%
Latin
ValueCountFrequency (%)
N 2545
25.0%
o 2545
25.0%
r 2545
25.0%
m 2545
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20608
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 5214
25.3%
> 2607
12.7%
N 2545
12.3%
o 2545
12.3%
r 2545
12.3%
m 2545
12.3%
2 1419
 
6.9%
3 1188
 
5.8%

A1Cresult
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 82509
Missing (%): 83.1%
Memory size: 3.5 MiB

>8 | 8137
Norm | 4922
>7 | 3775

Length

Max length: 4
Median length: 2
Mean length: 2.5847689
Min length: 2

Characters and Unicode

Total characters: 43512
Distinct characters: 7
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: >7
2nd row: >7
3rd row: >8
4th row: Norm
5th row: Norm

Common Values

Value | Count | Frequency (%)
>8 | 8137 | 8.2%
Norm | 4922 | 5.0%
>7 | 3775 | 3.8%
(Missing) | 82509 | 83.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
8 | 8137 | 48.3%
norm | 4922 | 29.2%
7 | 3775 | 22.4%

Most occurring characters

ValueCountFrequency (%)
> 11912
27.4%
8 8137
18.7%
N 4922
11.3%
o 4922
11.3%
r 4922
11.3%
m 4922
11.3%
7 3775
 
8.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 14766
33.9%
Math Symbol 11912
27.4%
Decimal Number 11912
27.4%
Uppercase Letter 4922
 
11.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 4922
33.3%
r 4922
33.3%
m 4922
33.3%
Decimal Number
ValueCountFrequency (%)
8 8137
68.3%
7 3775
31.7%
Math Symbol
ValueCountFrequency (%)
> 11912
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 4922
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23824
54.8%
Latin 19688
45.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 4922
25.0%
o 4922
25.0%
r 4922
25.0%
m 4922
25.0%
Common
ValueCountFrequency (%)
> 11912
50.0%
8 8137
34.2%
7 3775
 
15.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43512
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
> 11912
27.4%
8 8137
18.7%
N 4922
11.3%
o 4922
11.3%
r 4922
11.3%
m 4922
11.3%
7 3775
 
8.7%

change
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 53221
Ch | 46122

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 198686
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: Ch
3rd row: No
4th row: Ch
5th row: Ch

Common Values

Value | Count | Frequency (%)
No | 53221 | 53.6%
Ch | 46122 | 46.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 53221 | 53.6%
ch | 46122 | 46.4%

Most occurring characters

ValueCountFrequency (%)
N 53221
26.8%
o 53221
26.8%
C 46122
23.2%
h 46122
23.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 99343
50.0%
Lowercase Letter 99343
50.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 53221
53.6%
C 46122
46.4%
Lowercase Letter
ValueCountFrequency (%)
o 53221
53.6%
h 46122
46.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 198686
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 53221
26.8%
o 53221
26.8%
C 46122
23.2%
h 46122
23.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 198686
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 53221
26.8%
o 53221
26.8%
C 46122
23.2%
h 46122
23.2%

diabetesMed
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

Yes | 76719
No | 22624

Length

Max length: 3
Median length: 3
Mean length: 2.7722638
Min length: 2

Characters and Unicode

Total characters: 275405
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: Yes
3rd row: Yes
4th row: Yes
5th row: Yes

Common Values

Value | Count | Frequency (%)
Yes | 76719 | 77.2%
No | 22624 | 22.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
yes | 76719 | 77.2%
no | 22624 | 22.8%

Most occurring characters

ValueCountFrequency (%)
Y 76719
27.9%
e 76719
27.9%
s 76719
27.9%
N 22624
 
8.2%
o 22624
 
8.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 176062
63.9%
Uppercase Letter 99343
36.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76719
43.6%
s 76719
43.6%
o 22624
 
12.9%
Uppercase Letter
ValueCountFrequency (%)
Y 76719
77.2%
N 22624
 
22.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 275405
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
Y 76719
27.9%
e 76719
27.9%
s 76719
27.9%
N 22624
 
8.2%
o 22624
 
8.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 275405
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
Y 76719
27.9%
e 76719
27.9%
s 76719
27.9%
N 22624
 
8.2%
o 22624
 
8.2%

readmitted
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

NO | 52527
>30 | 35502
<30 | 11314

Length

Max length: 3
Median length: 2
Mean length: 2.4712562
Min length: 2

Characters and Unicode

Total characters: 245502
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NO
2nd row: >30
3rd row: NO
4th row: NO
5th row: NO

Common Values

Value | Count | Frequency (%)
NO | 52527 | 52.9%
>30 | 35502 | 35.7%
<30 | 11314 | 11.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 52527 | 52.9%
30 | 46816 | 47.1%

Most occurring characters

ValueCountFrequency (%)
N 52527
21.4%
O 52527
21.4%
3 46816
19.1%
0 46816
19.1%
> 35502
14.5%
< 11314
 
4.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 105054
42.8%
Decimal Number 93632
38.1%
Math Symbol 46816
19.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 52527
50.0%
O 52527
50.0%
Decimal Number
ValueCountFrequency (%)
3 46816
50.0%
0 46816
50.0%
Math Symbol
ValueCountFrequency (%)
> 35502
75.8%
< 11314
 
24.2%

Most occurring scripts

ValueCountFrequency (%)
Common 140448
57.2%
Latin 105054
42.8%

Most frequent character per script

Common
ValueCountFrequency (%)
3 46816
33.3%
0 46816
33.3%
> 35502
25.3%
< 11314
 
8.1%
Latin
ValueCountFrequency (%)
N 52527
50.0%
O 52527
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 245502
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 52527
21.4%
O 52527
21.4%
3 46816
19.1%
0 46816
19.1%
> 35502
14.5%
< 11314
 
4.6%

Meglitinides
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 97145
Steady | 2013
Adjusted | 185
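
The report's alerts flag this variable as imbalanced at 89.8% but do not say how that figure is computed. One formula that appears to reproduce it is one minus the Shannon entropy of the class frequencies, normalized by the logarithm of the number of classes. The sketch below is an assumption about the metric, not a statement of the tool's implementation, and `imbalance_score` is a name invented here.

```python
from math import log


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure (choice made for this sketch)
    total = sum(counts)
    entropy = -sum((c / total) * log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / log(k)


# Class counts for this variable: No, Steady, Adjusted
score = imbalance_score([97145, 2013, 185])
print(f"{score:.1%}")
```

Under this assumption the score comes out at about 0.898, in line with the 89.8% in the alert. The same formula gives about 94.8% for AG_Inhibitors (98357, 944, 42), which also matches its alert.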

Length

Max length: 8
Median length: 2
Mean length: 2.0922259
Min length: 2

Characters and Unicode

Total characters: 207848
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 97145 | 97.8%
Steady | 2013 | 2.0%
Adjusted | 185 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 97145 | 97.8%
steady | 2013 | 2.0%
adjusted | 185 | 0.2%

Most occurring characters

ValueCountFrequency (%)
N 97145
46.7%
o 97145
46.7%
d 2383
 
1.1%
t 2198
 
1.1%
e 2198
 
1.1%
S 2013
 
1.0%
a 2013
 
1.0%
y 2013
 
1.0%
A 185
 
0.1%
j 185
 
0.1%
Other values (2) 370
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 108505
52.2%
Uppercase Letter 99343
47.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 97145
89.5%
d 2383
 
2.2%
t 2198
 
2.0%
e 2198
 
2.0%
a 2013
 
1.9%
y 2013
 
1.9%
j 185
 
0.2%
u 185
 
0.2%
s 185
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
N 97145
97.8%
S 2013
 
2.0%
A 185
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 207848
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 97145
46.7%
o 97145
46.7%
d 2383
 
1.1%
t 2198
 
1.1%
e 2198
 
1.1%
S 2013
 
1.0%
a 2013
 
1.0%
y 2013
 
1.0%
A 185
 
0.1%
j 185
 
0.1%
Other values (2) 370
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 207848
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 97145
46.7%
o 97145
46.7%
d 2383
 
1.1%
t 2198
 
1.1%
e 2198
 
1.1%
S 2013
 
1.0%
a 2013
 
1.0%
y 2013
 
1.0%
A 185
 
0.1%
j 185
 
0.1%
Other values (2) 370
 
0.2%

Thiazolidinediones
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 99341
Steady | 2

Length

Max length: 6
Median length: 2
Mean length: 2.0000805
Min length: 2

Characters and Unicode

Total characters: 198694
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 99341 | > 99.9%
Steady | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 99341 | > 99.9%
steady | 2 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
N 99341
50.0%
o 99341
50.0%
S 2
 
< 0.1%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 99351
50.0%
Uppercase Letter 99343
50.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 99341
> 99.9%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
N 99341
> 99.9%
S 2
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 198694
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 99341
50.0%
o 99341
50.0%
S 2
 
< 0.1%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 198694
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 99341
50.0%
o 99341
50.0%
S 2
 
< 0.1%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%

Sulfonylureas
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 99342
Steady | 1

Length

Max length: 6
Median length: 2
Mean length: 2.0000403
Min length: 2

Characters and Unicode

Total characters: 198690
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 99342 | > 99.9%
Steady | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 99342 | > 99.9%
steady | 1 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
N 99342
50.0%
o 99342
50.0%
S 1
 
< 0.1%
t 1
 
< 0.1%
e 1
 
< 0.1%
a 1
 
< 0.1%
d 1
 
< 0.1%
y 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 99347
50.0%
Uppercase Letter 99343
50.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 99342
> 99.9%
t 1
 
< 0.1%
e 1
 
< 0.1%
a 1
 
< 0.1%
d 1
 
< 0.1%
y 1
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
N 99342
> 99.9%
S 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 198690
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 99342
50.0%
o 99342
50.0%
S 1
 
< 0.1%
t 1
 
< 0.1%
e 1
 
< 0.1%
a 1
 
< 0.1%
d 1
 
< 0.1%
y 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 198690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 99342
50.0%
o 99342
50.0%
S 1
 
< 0.1%
t 1
 
< 0.1%
e 1
 
< 0.1%
a 1
 
< 0.1%
d 1
 
< 0.1%
y 1
 
< 0.1%

AG_Inhibitors
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 98357
Steady | 944
Adjusted | 42

Length

Max length: 8
Median length: 2
Mean length: 2.0405464
Min length: 2

Characters and Unicode

Total characters: 202714
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 98357 | 99.0%
Steady | 944 | 1.0%
Adjusted | 42 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 98357 | 99.0%
steady | 944 | 1.0%
adjusted | 42 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
N 98357
48.5%
o 98357
48.5%
d 1028
 
0.5%
t 986
 
0.5%
e 986
 
0.5%
S 944
 
0.5%
a 944
 
0.5%
y 944
 
0.5%
A 42
 
< 0.1%
j 42
 
< 0.1%
Other values (2) 84
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 103371
51.0%
Uppercase Letter 99343
49.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 98357
95.1%
d 1028
 
1.0%
t 986
 
1.0%
e 986
 
1.0%
a 944
 
0.9%
y 944
 
0.9%
j 42
 
< 0.1%
u 42
 
< 0.1%
s 42
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
N 98357
99.0%
S 944
 
1.0%
A 42
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 202714
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 98357
48.5%
o 98357
48.5%
d 1028
 
0.5%
t 986
 
0.5%
e 986
 
0.5%
S 944
 
0.5%
a 944
 
0.5%
y 944
 
0.5%
A 42
 
< 0.1%
j 42
 
< 0.1%
Other values (2) 84
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 202714
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 98357
48.5%
o 98357
48.5%
d 1028
 
0.5%
t 986
 
0.5%
e 986
 
0.5%
S 944
 
0.5%
a 944
 
0.5%
y 944
 
0.5%
A 42
 
< 0.1%
j 42
 
< 0.1%
Other values (2) 84
 
< 0.1%

Metformin
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 99341
Steady | 2

Length

Max length: 6
Median length: 2
Mean length: 2.0000805
Min length: 2

Characters and Unicode

Total characters: 198694
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 99341 | > 99.9%
Steady | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 99341 | > 99.9%
steady | 2 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
N 99341
50.0%
o 99341
50.0%
S 2
 
< 0.1%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 99351
50.0%
Uppercase Letter 99343
50.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 99341
> 99.9%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
N 99341
> 99.9%
S 2
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 198694
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 99341
50.0%
o 99341
50.0%
S 2
 
< 0.1%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 198694
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 99341
50.0%
o 99341
50.0%
S 2
 
< 0.1%
t 2
 
< 0.1%
e 2
 
< 0.1%
a 2
 
< 0.1%
d 2
 
< 0.1%
y 2
 
< 0.1%

Insulin
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

No | 46379
Steady | 30069
Adjusted | 22895

Length

Max length: 8
Median length: 6
Mean length: 4.5934993
Min length: 2

Characters and Unicode

Total characters: 456332
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: Adjusted
3rd row: No
4th row: Adjusted
5th row: Steady

Common Values

Value | Count | Frequency (%)
No | 46379 | 46.7%
Steady | 30069 | 30.3%
Adjusted | 22895 | 23.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 46379 | 46.7%
steady | 30069 | 30.3%
adjusted | 22895 | 23.0%

Most occurring characters

ValueCountFrequency (%)
d 75859
16.6%
t 52964
11.6%
e 52964
11.6%
N 46379
10.2%
o 46379
10.2%
S 30069
 
6.6%
a 30069
 
6.6%
y 30069
 
6.6%
A 22895
 
5.0%
j 22895
 
5.0%
Other values (2) 45790
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 356989
78.2%
Uppercase Letter 99343
 
21.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d 75859
21.2%
t 52964
14.8%
e 52964
14.8%
o 46379
13.0%
a 30069
 
8.4%
y 30069
 
8.4%
j 22895
 
6.4%
u 22895
 
6.4%
s 22895
 
6.4%
Uppercase Letter
ValueCountFrequency (%)
N 46379
46.7%
S 30069
30.3%
A 22895
23.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 456332
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
d 75859
16.6%
t 52964
11.6%
e 52964
11.6%
N 46379
10.2%
o 46379
10.2%
S 30069
 
6.6%
a 30069
 
6.6%
y 30069
 
6.6%
A 22895
 
5.0%
j 22895
 
5.0%
Other values (2) 45790
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 456332
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
d 75859
16.6%
t 52964
11.6%
e 52964
11.6%
N 46379
10.2%
o 46379
10.2%
S 30069
 
6.6%
a 30069
 
6.6%
y 30069
 
6.6%
A 22895
 
5.0%
j 22895
 
5.0%
Other values (2) 45790
10.0%

Admission_Source
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

Emerg_Dept | 55850
Referral | 30434
Missing | 6858
Transfer | 6185
Other | 16

Length

Max length: 10
Median length: 10
Mean length: 9.0548705
Min length: 5

Characters and Unicode

Total characters: 899538
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Referral
2nd row: Emerg_Dept
3rd row: Emerg_Dept
4th row: Emerg_Dept
5th row: Emerg_Dept

Common Values

Value | Count | Frequency (%)
Emerg_Dept | 55850 | 56.2%
Referral | 30434 | 30.6%
Missing | 6858 | 6.9%
Transfer | 6185 | 6.2%
Other | 16 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
emerg_dept | 55850 | 56.2%
referral | 30434 | 30.6%
missing | 6858 | 6.9%
transfer | 6185 | 6.2%
other | 16 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
e 178769
19.9%
r 129104
14.4%
g 62708
 
7.0%
t 55866
 
6.2%
E 55850
 
6.2%
m 55850
 
6.2%
_ 55850
 
6.2%
D 55850
 
6.2%
p 55850
 
6.2%
a 36619
 
4.1%
Other values (10) 157222
17.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 688495
76.5%
Uppercase Letter 155193
 
17.3%
Connector Punctuation 55850
 
6.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 178769
26.0%
r 129104
18.8%
g 62708
 
9.1%
t 55866
 
8.1%
m 55850
 
8.1%
p 55850
 
8.1%
a 36619
 
5.3%
f 36619
 
5.3%
l 30434
 
4.4%
s 19901
 
2.9%
Other values (3) 26775
 
3.9%
Uppercase Letter
ValueCountFrequency (%)
E 55850
36.0%
D 55850
36.0%
R 30434
19.6%
M 6858
 
4.4%
T 6185
 
4.0%
O 16
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 55850
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 843688
93.8%
Common 55850
 
6.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 178769
21.2%
r 129104
15.3%
g 62708
 
7.4%
t 55866
 
6.6%
E 55850
 
6.6%
m 55850
 
6.6%
D 55850
 
6.6%
p 55850
 
6.6%
a 36619
 
4.3%
f 36619
 
4.3%
Other values (9) 120603
14.3%
Common
ValueCountFrequency (%)
_ 55850
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 899538
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 178769
19.9%
r 129104
14.4%
g 62708
 
7.0%
t 55866
 
6.2%
E 55850
 
6.2%
m 55850
 
6.2%
_ 55850
 
6.2%
D 55850
 
6.2%
p 55850
 
6.2%
a 36619
 
4.1%
Other values (10) 157222
17.5%

Interactions

(64 pairwise interaction plots; images not reproduced in this text version.)

Correlations

variable | time_in_hospital | num_lab_procedures | num_procedures | num_medications | number_outpatient | number_emergency | number_inpatient | number_diagnoses | gender | age | admission_type_id | discharge_disposition_id | max_glu_serum | A1Cresult | change | diabetesMed | readmitted | Meglitinides | Thiazolidinediones | Sulfonylureas | AG_Inhibitors | Metformin | Insulin | Admission_Source
time_in_hospital | 1.000 | 0.339 | 0.187 | 0.463 | -0.014 | -0.001 | 0.093 | 0.238 | 0.028 | 0.043 | 0.025 | 0.111 | 0.137 | 0.023 | 0.114 | 0.068 | 0.050 | 0.028 | 0.000 | 0.000 | 0.010 | 0.000 | 0.094 | 0.045
num_lab_procedures | 0.339 | 1.000 | 0.017 | 0.248 | -0.024 | 0.008 | 0.041 | 0.166 | 0.017 | 0.022 | 0.169 | 0.044 | 0.153 | 0.027 | 0.072 | 0.044 | 0.037 | 0.024 | 0.000 | 0.000 | 0.009 | 0.000 | 0.089 | 0.218
num_procedures | 0.187 | 0.017 | 1.000 | 0.349 | -0.024 | -0.047 | -0.064 | 0.063 | 0.046 | 0.065 | 0.139 | 0.054 | 0.034 | 0.024 | 0.027 | 0.032 | 0.035 | 0.000 | 0.006 | 0.000 | 0.000 | 0.006 | 0.026 | 0.127
num_medications | 0.463 | 0.248 | 0.349 | 1.000 | 0.075 | 0.046 | 0.101 | 0.294 | 0.037 | 0.061 | 0.083 | 0.078 | 0.133 | 0.030 | 0.243 | 0.193 | 0.066 | 0.028 | 0.039 | 0.000 | 0.023 | 0.039 | 0.171 | 0.063
number_outpatient | -0.014 | -0.024 | -0.024 | 0.075 | 1.000 | 0.179 | 0.157 | 0.114 | 0.000 | 0.003 | 0.014 | 0.017 | 0.000 | 0.018 | 0.016 | 0.004 | 0.029 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.020 | 0.013
number_emergency | -0.001 | 0.008 | -0.047 | 0.046 | 0.179 | 1.000 | 0.223 | 0.093 | 0.000 | 0.027 | 0.009 | 0.000 | 0.015 | 0.007 | 0.015 | 0.007 | 0.029 | 0.005 | 0.000 | 0.000 | 0.019 | 0.000 | 0.021 | 0.027
number_inpatient | 0.093 | 0.041 | -0.064 | 0.101 | 0.157 | 0.223 | 1.000 | 0.136 | 0.008 | 0.049 | 0.020 | 0.022 | 0.082 | 0.023 | 0.018 | 0.018 | 0.134 | 0.000 | 0.000 | 0.000 | 0.003 | 0.000 | 0.054 | 0.032
number_diagnoses | 0.238 | 0.166 | 0.063 | 0.294 | 0.114 | 0.093 | 0.136 | 1.000 | 0.000 | 0.130 | 0.080 | 0.076 | 0.052 | 0.112 | 0.058 | 0.032 | 0.090 | 0.032 | 0.000 | 0.004 | 0.010 | 0.000 | 0.095 | 0.122
gender | 0.028 | 0.017 | 0.046 | 0.037 | 0.000 | 0.000 | 0.008 | 0.000 | 1.000 | 0.079 | 0.012 | 0.066 | 0.000 | 0.032 | 0.015 | 0.016 | 0.012 | 0.001 | 0.002 | 0.000 | 0.000 | 0.002 | 0.001 | 0.015
age | 0.043 | 0.022 | 0.065 | 0.061 | 0.003 | 0.027 | 0.049 | 0.130 | 0.079 | 1.000 | 0.045 | 0.126 | 0.141 | 0.183 | 0.054 | 0.041 | 0.043 | 0.038 | 0.000 | 0.000 | 0.014 | 0.000 | 0.084 | 0.051
admission_type_id | 0.025 | 0.169 | 0.139 | 0.083 | 0.014 | 0.009 | 0.020 | 0.080 | 0.012 | 0.045 | 1.000 | 0.076 | 0.126 | 0.063 | 0.029 | 0.018 | 0.045 | 0.041 | 0.000 | 0.000 | 0.013 | 0.000 | 0.044 | 0.504
discharge_disposition_id | 0.111 | 0.044 | 0.054 | 0.078 | 0.017 | 0.000 | 0.022 | 0.076 | 0.066 | 0.126 | 0.076 | 1.000 | 0.049 | 0.084 | 0.093 | 0.075 | 0.091 | 0.025 | 0.000 | 0.000 | 0.004 | 0.000 | 0.102 | 0.082
max_glu_serum | 0.137 | 0.153 | 0.034 | 0.133 | 0.000 | 0.015 | 0.082 | 0.052 | 0.000 | 0.141 | 0.126 | 0.049 | 1.000 | 0.365 | 0.249 | 0.190 | 0.068 | 0.025 | 1.000 | 1.000 | 0.019 | 1.000 | 0.220 | 0.151
A1Cresult | 0.023 | 0.027 | 0.024 | 0.030 | 0.018 | 0.007 | 0.023 | 0.112 | 0.032 | 0.183 | 0.063 | 0.084 | 0.365 | 1.000 | 0.188 | 0.183 | 0.019 | 0.027 | 0.000 | 1.000 | 0.000 | 0.000 | 0.153 | 0.041
change | 0.114 | 0.072 | 0.027 | 0.243 | 0.016 | 0.015 | 0.018 | 0.058 | 0.015 | 0.054 | 0.029 | 0.093 | 0.249 | 0.188 | 1.000 | 0.505 | 0.043 | 0.095 | 0.000 | 0.000 | 0.071 | 0.000 | 0.638 | 0.042
diabetesMed | 0.068 | 0.044 | 0.032 | 0.193 | 0.004 | 0.007 | 0.018 | 0.032 | 0.016 | 0.041 | 0.018 | 0.075 | 0.190 | 0.183 | 0.505 | 1.000 | 0.058 | 0.082 | 0.000 | 0.000 | 0.054 | 0.000 | 0.580 | 0.005
readmitted | 0.050 | 0.037 | 0.035 | 0.066 | 0.029 | 0.029 | 0.134 | 0.090 | 0.012 | 0.043 | 0.045 | 0.091 | 0.068 | 0.019 | 0.043 | 0.058 | 1.000 | 0.014 | 0.000 | 0.000 | 0.009 | 0.000 | 0.052 | 0.075
Meglitinides | 0.028 | 0.024 | 0.000 | 0.028 | 0.000 | 0.005 | 0.000 | 0.032 | 0.001 | 0.038 | 0.041 | 0.025 | 0.025 | 0.027 | 0.095 | 0.082 | 0.014 | 1.000 | 0.000 | 0.000 | 0.388 | 0.000 | 0.017 | 0.028
Thiazolidinediones | 0.000 | 0.000 | 0.006 | 0.039 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.750 | 0.000 | 0.000
Sulfonylureas | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.004 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.002 | 0.000
AG_Inhibitors | 0.010 | 0.009 | 0.000 | 0.023 | 0.000 | 0.019 | 0.003 | 0.010 | 0.000 | 0.014 | 0.013 | 0.004 | 0.019 | 0.000 | 0.071 | 0.054 | 0.009 | 0.388 | 0.000 | 0.000 | 1.000 | 0.000 | 0.005 | 0.012
Metformin | 0.000 | 0.000 | 0.006 | 0.039 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.750 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
Insulin | 0.094 | 0.089 | 0.026 | 0.171 | 0.020 | 0.021 | 0.054 | 0.095 | 0.001 | 0.084 | 0.044 | 0.102 | 0.220 | 0.153 | 0.638 | 0.580 | 0.052 | 0.017 | 0.000 | 0.002 | 0.005 | 0.000 | 1.000 | 0.061
Admission_Source | 0.045 | 0.218 | 0.127 | 0.063 | 0.013 | 0.027 | 0.032 | 0.122 | 0.015 | 0.051 | 0.504 | 0.082 | 0.151 | 0.041 | 0.042 | 0.005 | 0.075 | 0.028 | 0.000 | 0.000 | 0.012 | 0.000 | 0.061 | 1.000
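
The report does not state which coefficient fills each cell. Profiling tools of this kind commonly use Cramér's V for pairs that involve a categorical variable, which would be consistent with those cells never being negative. Under that assumption, the sketch below computes the plain (uncorrected) Cramér's V in pure Python; `cramers_v` and the toy data are hypothetical and not taken from the dataset.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V for two equal-length sequences of labels.

    Pairs where either value is None are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    rows = Counter(x for x, _ in pairs)
    cols = Counter(y for _, y in pairs)
    if n == 0 or min(len(rows), len(cols)) < 2:
        return 0.0  # no association can be measured with a constant column
    observed = Counter(pairs)
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            chi2 += (observed.get((r, c), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


# Toy columns: the only two non-majority rows coincide in both columns.
drug = ["No"] * 98 + ["Steady"] * 2
lab = ["Norm"] * 98 + [">300"] * 2
v = cramers_v(drug, lab)
print(round(v, 3))
```

In the toy data the coefficient reaches 1 even though only two rows differ from the majority. The 1.000 entries between max_glu_serum and Thiazolidinediones, Sulfonylureas and Metformin may arise in a similar way, since each of those three variables has only one or two non-"No" rows, so they are worth reading with caution.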

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
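
To make the nullity correlation concrete, the sketch below computes the Pearson correlation between the presence indicators of two columns. It is a minimal pure-Python illustration rather than the report's own code; `nullity_correlation` and the sample columns are made up for the example.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no nullity variance.
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Hypothetical columns; None marks a missing cell.
glucose = [None, None, "Norm", None, ">200", None]
a1c = [None, ">8", "Norm", None, None, None]
print(nullity_correlation(glucose, a1c))
```

A value near 1 means the two columns tend to be filled in on the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.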

Sample

(index)  gender  age  admission_type_id  discharge_disposition_id  time_in_hospital  num_lab_procedures  num_procedures  num_medications  number_outpatient  number_emergency  number_inpatient  diag_1  diag_2  diag_3  number_diagnoses  max_glu_serum  A1Cresult  change  diabetesMed  readmitted  Meglitinides  Thiazolidinediones  Sulfonylureas  AG_Inhibitors  Metformin  Insulin  Admission_Source
0  Female  [0-10)  Missing  999  1  41  0  1  0  0  0  250.83  NaN  NaN  1  NaN  NaN  No  No  NO  No  No  No  No  No  No  Referral
1  Female  [10-20)  Emergency  1  3  59  0  18  0  0  0  276  250.01  255  9  NaN  NaN  Ch  Yes  >30  No  No  No  No  No  Adjusted  Emerg_Dept
2  Female  [20-30)  Emergency  1  2  11  5  13  2  0  1  648  250  V27  6  NaN  NaN  No  Yes  NO  No  No  No  No  No  No  Emerg_Dept
3  Male  [30-40)  Emergency  1  2  44  1  16  0  0  0  8  250.43  403  7  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Adjusted  Emerg_Dept
4  Male  [40-50)  Emergency  1  1  51  0  8  0  0  0  197  157  250  5  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Steady  Emerg_Dept
5  Male  [50-60)  Urgent  1  3  31  6  16  0  0  0  414  411  250  9  NaN  NaN  No  Yes  >30  No  No  No  No  No  Steady  Referral
6  Male  [60-70)  Elective  1  4  70  1  21  0  0  0  414  411  V45  7  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Steady  Referral
7  Male  [70-80)  Emergency  1  5  73  0  12  0  0  0  428  492  250  8  NaN  NaN  No  Yes  >30  No  No  No  No  No  No  Emerg_Dept
8  Female  [80-90)  Urgent  1  13  68  2  28  0  0  0  398  427  38  8  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Steady  Transfer
9  Female  [90-100)  Elective  3  12  33  3  18  0  0  0  434  198  486  8  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Steady  Transfer
(index)  gender  age  admission_type_id  discharge_disposition_id  time_in_hospital  num_lab_procedures  num_procedures  num_medications  number_outpatient  number_emergency  number_inpatient  diag_1  diag_2  diag_3  number_diagnoses  max_glu_serum  A1Cresult  change  diabetesMed  readmitted  Meglitinides  Thiazolidinediones  Sulfonylureas  AG_Inhibitors  Metformin  Insulin  Admission_Source
101756  Female  [60-70)  Emergency  1  2  46  6  17  1  1  1  996  585  403  9  NaN  NaN  No  Yes  >30  No  No  No  No  No  Steady  Emerg_Dept
101757  Female  [70-80)  Emergency  1  5  21  1  16  0  0  1  491  518  511  9  NaN  NaN  No  Yes  NO  No  No  No  No  No  Steady  Emerg_Dept
101758  Female  [80-90)  Emergency  1  5  76  1  22  0  1  0  292  8  304  9  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Adjusted  Emerg_Dept
101759  Male  [80-90)  Emergency  1  1  1  0  15  3  0  0  435  784  250  7  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Adjusted  Emerg_Dept
101760  Female  [60-70)  Emergency  1  6  45  1  25  3  1  2  345  438  412  9  NaN  NaN  Ch  Yes  >30  No  No  No  No  No  Adjusted  Emerg_Dept
101761  Male  [70-80)  Emergency  3  3  51  0  16  0  0  0  250.13  291  458  9  NaN  >8  Ch  Yes  >30  No  No  No  No  No  Adjusted  Emerg_Dept
101762  Female  [80-90)  Emergency  4  5  33  3  18  0  0  1  560  276  787  9  NaN  NaN  No  Yes  NO  No  No  No  No  No  Steady  Transfer
101763  Male  [70-80)  Emergency  1  1  53  0  9  1  0  0  38  590  296  13  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Adjusted  Emerg_Dept
101764  Female  [80-90)  Urgent  3  10  45  2  21  0  0  1  996  285  998  9  NaN  NaN  Ch  Yes  NO  No  No  No  No  No  Adjusted  Emerg_Dept
101765  Male  [70-80)  Emergency  1  6  13  3  3  0  0  0  530  530  787  9  NaN  NaN  No  No  NO  No  No  No  No  No  No  Emerg_Dept